CoVarNet is a computational framework aiming to unravel the coordination among multiple cell types by analyzing the covariance in the frequencies of cell types across various samples.
CoVarNet has generalizability to be applied to external datasets to recover cellular modules. This tutorial shows how to use CoVarNet method to recover co-occurring cellular modules in scRNA-seq data and spatial transcriptomics data. Notably, the recovery of cellular modules requires CM composition file as a reference, derived as described in Tutorial 1. We provide an example reference file data/network.rds, which is defined from the comprehensive pan-tissue cell atlas. Also, users can establish a reference file on user-defined CMs from their own data.
library(CoVarNet)
The R packages listed below are required for running CoVarNet. The
version numbers indicate the package versions used for testing the
CoVarNet code. Other R versions might work too.
Before the recovery, users should initially annotate the broad cell types and cell subsets of each cell, and then prepare the cell type annotation file. This file should contain at least three columns: “sampleID”, “majorCluster” which contains the broad cell types, “subCluster” which contains the cell subsets. Here we use a subset (100 randomly selected samples) of the pan-tissue cell atlas as an example, available in data/sc_recover_example.rds. The reference file is available in data/network.rds.
meta<-readRDS("./data/example_recovery.rds")
meta[1:5,1:3]
## sampleID majorCluster subCluster
## D01-TGGGCGTCATCGGACC-1-marrow BoneMarrow B B01_Bn_IGHM
## D01-TCAGCTCTCCTTGGTC-1-marrow BoneMarrow B B01_Bn_IGHM
## D01-TGCGTGGCAGATAATG-1-marrow BoneMarrow B B01_Bn_IGHM
## D01-TAGCCGGGTGCAGACA-1-marrow BoneMarrow B B01_Bn_IGHM
## D01-CCGTGGATCGTCCGTT-1-marrow BoneMarrow B B01_Bn_IGHM
The annotation of cell subsets should be consistent with the reference file.
ref=readRDS("data/network.rds")
identical(sort(unique(meta$subCluster)),sort(rownames(basis(ref$raw))))
## [1] TRUE
In accordance with Tutorial 1, filter the cell annotation file and obtain the normalized frequency matrix.
meta <- meta[meta$cellSort %in% c("Mix", "Total"), ]
rt_sp <- names(table(meta$sampleID))[table(meta$sampleID) >= 100]
meta <- meta[meta$sampleID %in% rt_sp, ]
mat_fq_raw<-freq_calculate(meta)
## `mutate_if()` ignored the following grouping variables:
## • Column `majorCluster`
mat_fq_norm<-freq_normalize(mat_fq_raw,normalize="minmax")
The command line is:
mat_cm<-sc_cm_recover(ref,mat_fq_norm)
The output of this command is H: the activity of CMs in each sample.
mat_cm[1:12,1:3]
## 145 621B-BoneMarrow 640C-LymphNode
## CM01 2.302459e-02 2.220446e-16 6.816797e-02
## CM02 2.220446e-16 1.472005e+00 9.169505e-02
## CM03 2.220446e-16 2.220446e-16 3.602602e-01
## CM04 2.220446e-16 3.173015e-01 2.502810e-01
## CM05 2.220446e-16 1.455042e+00 6.269249e+00
## CM06 2.220446e-16 5.516095e-01 1.318212e-01
## CM07 1.160867e+00 2.220446e-16 2.220446e-16
## CM08 2.220446e-16 2.220446e-16 2.220446e-16
## CM09 2.220446e-16 1.494245e+00 5.764557e-01
## CM10 2.220446e-16 2.220446e-16 2.220446e-16
## CM11 2.220446e-16 1.525283e-01 6.039352e-02
## CM12 2.602212e-02 2.220446e-16 2.220446e-16
Categorize all samples into exclusive CM types.
max_cm <- apply(mat_cm, 2, function(x) rownames(mat_cm)[which.max(x)])
max_cm<-gsub("CM", "CMT",max_cm)
max_cm[1:10]
## 145 621B-BoneMarrow 640C-LymphNode 640C-Spleen
## "CMT07" "CMT09" "CMT05" "CMT05"
## A16_TH_TOT_3 A16_TH_TOT_5 A16_TH_TOT_5GEX_4 A16_TH_TOT_7
## "CMT09" "CMT09" "CMT09" "CMT09"
## A26-TCL-0-SC-1 A32-MLN-0-SC-1
## "CMT03" "CMT09"
The heatmap displays CMs activity profiles across tissues. Each sample is assigned a dominant CM, and all samples are ordered by their CM identities and tissue types.
gr.distribution(mat_cm,meta=meta,group="tissue")
We posited that distinct cell subsets within each CM are spatially
orchestrated to respond collectively. Thus, we map the CMs to spatially
resolved transcriptomics data to investigate the spatial characteristics
of CMs.
Here we use ileum data generated by 10X Visium as an example
(presented in Bergenstråhle et al. Nature Biotechnology).
Before the recovery, we use cell2location to estimate the cell abundance
per cell subset in each spot. Similar to the requirement in the recovery
of scRNA-seq data, The annotation of cell subsets should be consistent
with the reference file data/network.rds. Four example data after
deconvolution are available in data/Ileum_1.rds, data/Ileum_2.rds,
data/Ileum_3.rds, data/Ileum_4.rds.
For users who use their own
spatial data to recover cellular modules, other annotation methods might
work too.
spe=readRDS("./data/Ileum_1.rds")
ref=readRDS("data/network.rds")
spe@meta.data[,rownames(basis(ref$raw))][1:5,1:3]
## B01_Bn_IGHM B02_Bn_TCL1A B03_Bn_NR4A2
## AAACAAGTATCTCCCA-1 0.007082924 0.03670086 0.004070013
## AAACACCAATAACTGC-1 0.008483844 0.04575415 0.005342734
## AAACAGAGCGACTCCT-1 0.049516937 0.24134531 0.013165244
## AAACAGCTTTCAGAAG-1 0.100642546 0.72028510 0.041941236
## AAACAGGGTCTATATT-1 0.019476875 0.11386335 0.012350037
We can use the same method to predict the activity of CMs in each spot through the function sc_cm_recover. Subsequently, visualize the spatial distribution of cellular modules using Seurat::SpatialFeaturePlot.
mat_fq_norm<-freq_normalize_st(spe,ref,normalize="minmax")
## `mutate_if()` ignored the following grouping variables:
## • Column `majorCluster`
mat_cm<-sc_cm_recover(ref,mat_fq_norm)
spe=AddMetaData(object = spe,
metadata = t(mat_cm),
col.name = rownames(mat_cm))
SpatialFeaturePlot(spe,features="CM03")
Additionally, we construct CM networks by interconnecting
co-occurring nodes through specifically correlated edges in Tutorial 1,
and thus obtain the cell subset components of each CM. With this
specific cell compositions, we can also recover CMs by calculating the
weighted sum of cell subset abundance in each spot.
spe=readRDS("./data/Ileum_1.rds")
spe=st_cm_recover(ref$filter,spe)
## No id variables; using all as measure variables
p1<-SpatialFeaturePlot(spe,features="CM03",pt.size.factor = 0)
p2<-SpatialFeaturePlot(spe,features=c("CM03","B12_Plasma_IGHA2","CD8T03_Trm_ITGA1","S07_Fb_TCF21","CD4T06_Tm_ANXA1","S20_Glia_CDH19"),ncol=3)
p3<-SpatialFeaturePlot(spe,features=c("CM05","CD8T02_Tem_GZMK","CD4T03_Tn_NR4A1","B03_Bn_NR4A2","B05_Bm_NR4A2","CD4T04_Tfh_IL6ST"),ncol=3)
H&E staining image of ileum_1
p1
p2
p3
The cell subsets of CM03 and
CM05 exhibited notable colocalization respectively, with CM03 enriched
in the intestinal mucosa and CM05 enriched in the Peyer’s patch region.
The heatmap quantifies spatial proximity among the cell subsets
of CM03 and CM05. Here We calculate Global Bivariate Moran’s I (using
spdep R package) between cell subsets on a single spatial slide.
coords<-spe@images$slice1@boundaries$centroids@coords
subsets=colnames(spe@meta.data)[4:79]
st_fq<-spe@meta.data[,subsets]
gr.proximity(coords,st_fq,ref,c("CM03","CM05"))
## Registered S3 methods overwritten by 'proxy':
## method from
## print.registry_field registry
## print.registry_entry registry
To evaluate the spatial colocalization of CM components in spatial transcriptomics, we calculated Spearman correlation coefficients for each pair of cell-subset components across spots of a single slide. The initial colocalization score for a CM was defined as the median of these correlation coefficients. Additionally, a background score was determined as the median of correlation coefficients for pairs, where one cell subset was part of the CM and the other was not. The final colocalization score was ascertained by subtracting the background score from the initial.
spe1=readRDS("./data/Ileum_1.rds")
st_fq<-spe1@meta.data[,subsets]
col=colocalization(st_fq,ref)
col
## cm score
## 1 CM01 0.4896552
## 2 CM02 0.6950355
## 3 CM03 0.6171329
## 4 CM04 0.6794326
## 5 CM05 0.6936170
## 6 CM06 0.7104851
## 7 CM07 0.5041096
## 8 CM08 0.6666667
## 9 CM09 0.8283688
## 10 CM10 0.6821918
## 11 CM11 0.6114943
## 12 CM12 0.5416667
Calculate the colocalization of CMs in multiple samples. The highest
concordance was observed in CMs with a higher proportion of
lymphocytes.
To assess the regional aggregation of cell-subset components within CMs in spatial transcriptomics, we utilized the global bivariate Moran’s I using the spdep R package. The initial aggregation score for each CM was determined by calculating the median Moran’s I values across all cell-subset component pairs within the CM. A background score was similarly derived from the median Moran’s I values of pairs consisting of one cell subset from the CM and one from outside. The final aggregation score was calculated by subtracting the background score from the initial.
coords<-spe1@images$slice1@boundaries$centroids@coords
bvM=bvMoranI(coords,st_fq,ref)
bvM
## cm score
## 1 CM01 0.5655172
## 2 CM02 0.6815603
## 3 CM03 0.7019231
## 4 CM04 0.7602837
## 5 CM05 0.7028369
## 6 CM06 0.7527387
## 7 CM07 0.5534247
## 8 CM08 0.6377152
## 9 CM09 0.8007092
## 10 CM10 0.7383562
## 11 CM11 0.5390805
## 12 CM12 0.5406746
Calculate the aggregation of CMs in multiple samples. Consistent with
the previous conclusion, the highest concordance was observed in CMs
with a higher proportion of lymphocytes.